R4DS 15 - Functions
The code below is from the practice exercises in https://r4ds.had.co.nz/, with reference to the solutions at: https://jrnold.github.io/r4ds-exercise-solutions/
Loading the tidyverse package:
library(tidyverse)
Why are functions important?
# rnorm - random generation for normal distribution
df <- tibble(
a = rnorm(10),
b = rnorm(10),
c = rnorm(10),
d = rnorm(10)
)
df
# A tibble: 10 x 4
a b c d
<dbl> <dbl> <dbl> <dbl>
1 2.33 -0.557 0.889 -1.45
2 -0.650 0.272 -1.92 -0.0814
3 -0.647 0.299 1.10 0.243
4 0.586 0.373 0.175 0.845
5 -0.953 -1.04 -0.410 -1.86
6 -0.293 -1.52 0.888 -1.29
7 0.654 0.169 -0.766 -0.775
8 0.980 -0.423 -0.280 0.292
9 -1.96 0.162 0.0332 1.08
10 -0.182 -0.454 -0.667 0.753
# Manual coding
df$a <- (df$a - min(df$a, na.rm = T)) /
  (max(df$a, na.rm = T) - min(df$a, na.rm = T))
df$b <- (df$b - min(df$b, na.rm = T)) /
  (max(df$b, na.rm = T) - min(df$b, na.rm = T))
df$c <- (df$c - min(df$c, na.rm = T)) /
  (max(df$c, na.rm = T) - min(df$c, na.rm = T))
df$d <- (df$d - min(df$d, na.rm = T)) /
  (max(df$d, na.rm = T) - min(df$d, na.rm = T))
# How to reduce copying, pasting, and manual replacing?
# Identify the number of inputs:
# - 1 variable: a numeric vector
x <- df$a
(x - min(x, na.rm = T)) / (max(x, na.rm = T) - min(x, na.rm = T))
range <- range(x, na.rm = T)
range # good practice to give names to intermediate calculations
[1] 0 1
# After trying it out with a simple input,
# you can turn it into a function:
# a. choose a name for the function
# b. list the inputs as arguments: function(x)
# c. place the code in the body of the function
rescale01 <- function(x) {
range <- range(x, na.rm = T)
(x - range[1])/(range[2] - range[1])
}
rescale01(c(0,5,10))
[1] 0.0 0.5 1.0
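With rescale01() defined, the four copy-pasted assignments from the manual version can collapse into one call applied to each column. A minimal base-R sketch, assuming a small made-up data frame; df_demo is a hypothetical name standing in for df:

```r
# rescale01() as defined above, with the intermediate named rng
# so it is not confused with the range() function
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Hypothetical data frame standing in for df
df_demo <- data.frame(
  a = c(2, 4, 6),
  b = c(10, 0, 5),
  c = c(-1, 1, 3)
)

# Apply rescale01() to every column; assigning into df_demo[] keeps it a data.frame
df_demo[] <- lapply(df_demo, rescale01)
df_demo
```

With the tidyverse loaded, df %>% mutate(across(a:d, rescale01)) should do the same for the tibble used in this post.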
# What if there are Inf values?
x <- c(1:10, Inf)
x
[1] 1 2 3 4 5 6 7 8 9 10 Inf
rescale01(x) # no error, but every finite value becomes 0 and Inf becomes NaN
[1] 0 0 0 0 0 0 0 0 0 0 NaN
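The NaN comes from the arithmetic rather than from an error: range() returns 1 and Inf, so each finite value is divided by Inf (giving 0), and the last element becomes Inf / Inf. A quick check in base R:

```r
x <- c(1:10, Inf)
rng <- range(x, na.rm = TRUE)
rng                                  # the maximum is Inf
(10 - rng[1]) / (rng[2] - rng[1])    # a finite value divided by Inf is 0
(Inf - rng[1]) / (rng[2] - rng[1])   # Inf / Inf is NaN
```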
# Let's fix the function
rescale01_inf <- function(x) {
range <- range(x, na.rm = T, finite = T)
(x - range[1])/(range[2] - range[1])
}
rescale01_inf(x)
[1] 0.0000000 0.1111111 0.2222222 0.3333333 0.4444444 0.5555556
[7] 0.6666667 0.7777778 0.8888889 1.0000000 Inf
# What if you want to map -Inf to 0, and Inf to 1?
range <- range(x, na.rm = T, finite = T)
y <- (x - range[1])/(range[2] - range[1])
y[y == -Inf] <- 0
y[y == Inf] <- 1
y
[1] 0.0000000 0.1111111 0.2222222 0.3333333 0.4444444 0.5555556
[7] 0.6666667 0.7777778 0.8888889 1.0000000 1.0000000
# put it into a function
rescale01_inf_b <- function(x) {
range <- range(x, na.rm = T, finite = T)
y <- (x - range[1])/(range[2] - range[1])
y[y == -Inf] <- 0
y[y == Inf] <- 1
y
}
rescale01_inf(x)
[1] 0.0000000 0.1111111 0.2222222 0.3333333 0.4444444 0.5555556
[7] 0.6666667 0.7777778 0.8888889 1.0000000 Inf
rescale01_inf_b(x)
[1] 0.0000000 0.1111111 0.2222222 0.3333333 0.4444444 0.5555556
[7] 0.6666667 0.7777778 0.8888889 1.0000000 1.0000000
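The example above only contains Inf, so the -Inf branch of rescale01_inf_b() is never exercised. A quick sketch with both infinities, repeating the same function definition so the block runs on its own:

```r
rescale01_inf_b <- function(x) {
  # finite = TRUE drops -Inf, Inf (and NA) before taking the range
  rng <- range(x, na.rm = TRUE, finite = TRUE)
  y <- (x - rng[1]) / (rng[2] - rng[1])
  y[y == -Inf] <- 0   # map -Inf to 0
  y[y == Inf] <- 1    # map Inf to 1
  y
}

z <- c(-Inf, 0, 5, 10, Inf)
rescale01_inf_b(z)   # the finite range is 0 to 10
```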
Practice turning the following code snippets into functions
# to calculate the proportion of NA values
x <- c(0, 1, 2, NA, 4, NA)
mean(is.na(x)) # proportion of values that are NA
[1] 0.3333333
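Wrapped in a function, the NA-proportion snippet might look like the sketch below; the name prop_na() is hypothetical, chosen only for illustration:

```r
# Hypothetical name: proportion of elements in x that are NA
prop_na <- function(x) {
  mean(is.na(x))
}

prop_na(c(0, 1, 2, NA, 4, NA))   # 2 of 6 values are NA
prop_na(c(1, 2, 3))              # no NA values
```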
# to standardize the vector so that it sums to 1
x/sum(x, na.rm = T)
[1] 0.0000000 0.1428571 0.2857143 NA 0.5714286 NA
# write the function
sum_to_one <- function(x, na.rm = F){
x/sum(x, na.rm = na.rm)
}
sum_to_one(1:5)
[1] 0.06666667 0.13333333 0.20000000 0.26666667 0.33333333
sum_to_one(c(1:5, NA))
[1] NA NA NA NA NA NA
sum_to_one(c(1:5, NA), na.rm = T)
[1] 0.06666667 0.13333333 0.20000000 0.26666667 0.33333333 NA
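# coefficient of variation: standard deviation divided by the mean
sd(x, na.rm = T)/mean(x, na.rm = T)
[1] 0.9759001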
calc_coefficient_variation <- function(x, na.rm = F){
  sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
}
calc_coefficient_variation(1:5)
[1] 0.5270463
Compute the sample variance
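variance <- function(x, na.rm = T){
  if (na.rm) x <- x[!is.na(x)]  # drop NAs before counting n
  n <- length(x)
  m <- mean(x)
  sq_err <- (x - m)^2
  sum(sq_err)/(n - 1)  # divide by (n - 1), not n
}
variance(1:10)
[1] 9.166667
var(1:10) # base R, for comparison
[1] 9.166667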
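For reference, the sample variance divides the sum of squared deviations by n - 1, so the parentheses around n - 1 matter:

```latex
\mathrm{Var}(x) = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2
```

For x = 1:10, the mean is 5.5 and the squared deviations sum to 82.5, so Var(x) = 82.5 / 9 ≈ 9.1667, matching var(1:10).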
Compute the skewness
skewness <- function(x, na.rm = F) {
n <- length(x)
m <- mean(x, na.rm = na.rm)
v <- var(x, na.rm = na.rm)
sum((x-m)^3 / (n-2)) / v^(3/2)
}
skewness(c(1,2,5,100))
[1] 1.494554
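The skewness() function follows the definition given in the exercise, which uses n - 2 in the numerator term:

```latex
\mathrm{Skew}(x) = \frac{\frac{1}{n-2} \sum_{i=1}^{n} (x_i - \bar{x})^3}{\mathrm{Var}(x)^{3/2}}
```

For x = c(1, 2, 5, 100), the mean is 27, the cubed deviations sum to 345168, and Var(x) = 7114 / 3 ≈ 2371.33, so Skew(x) = 172584 / 2371.33^(3/2) ≈ 1.4946, in line with the output of skewness().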
Write a function, both_na(), that takes two vectors of the same length and returns the number of positions that have an NA in both vectors.
x <- c(1:10, NA)
x
[1] 1 2 3 4 5 6 7 8 9 10 NA
y <- c(1:10, NA)
y
[1] 1 2 3 4 5 6 7 8 9 10 NA
sum(is.na(x) & is.na(y)) # positions where both x and y are NA
[1] 1
# write the function
both_na <- function(x, y) {
sum(is.na(x) & is.na(y))
}
both_na(
c(NA, 1,2,4),
c(NA, NA, 1, 4)
)
[1] 1
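The exercise assumes both vectors have the same length. If they do not, & recycles the shorter vector (warning only when the longer length is not a multiple of the shorter), so the count can be silently wrong. A defensive sketch; both_na_checked() is a hypothetical name:

```r
# Hypothetical variant of both_na() that refuses vectors of different lengths
both_na_checked <- function(x, y) {
  if (length(x) != length(y)) {
    stop("`x` and `y` must have the same length", call. = FALSE)
  }
  sum(is.na(x) & is.na(y))
}

both_na_checked(c(NA, 1, 2, 4), c(NA, NA, 1, 4))   # only position 1 is NA in both
# both_na_checked(c(NA, NA), c(NA, NA, 1, 4))      # would stop with an error
```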
Functions aren’t as daunting as I thought. Writing one can be broken down into steps:
1. Know what you want to automate with the function.
2. Identify the input variables.
3. Try out the code on a simple input.
4. Write a function for the code and give it a proper name.
5. Even better, compile it into a package for future use.
https://jrnold.github.io/r4ds-exercise-solutions/
For attribution, please cite this work as
lruolin (2021, May 25). pRactice corner: Functions. Retrieved from https://lruolin.github.io/myBlog/posts/20210525_Tidyverse Chap 15 - Functions/
BibTeX citation
@misc{lruolin2021functions, author = {lruolin, }, title = {pRactice corner: Functions}, url = {https://lruolin.github.io/myBlog/posts/20210525_Tidyverse Chap 15 - Functions/}, year = {2021} }